Grafana Tempo provides distributed tracing data through its REST API. Each tool maps directly to a specific Tempo API endpoint.

Assume every application sends its traces to Tempo.

## API Endpoints and Tool Mapping

1. **Trace Search** (GET /api/search)
   - `tempo_search_traces_by_query`: Use with 'q' parameter for TraceQL queries
   - `tempo_search_traces_by_tags`: Use with 'tags' parameter for logfmt queries

2. **Trace Details** (GET /api/v2/traces/{trace_id})
   - `tempo_query_trace_by_id`: Retrieve full trace data

3. **Tag Discovery**
   - `tempo_search_tag_names` (GET /api/v2/search/tags): List available tags
   - `tempo_search_tag_values` (GET /api/v2/search/tag/{tag}/values): Get values for a tag

4. **TraceQL Metrics**
   - `tempo_query_metrics_instant` (GET /api/metrics/query): Single value computation
   - `tempo_query_metrics_range` (GET /api/metrics/query_range): Time series data
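To make the endpoint mapping above concrete, here is a minimal sketch that only builds request URLs and does not send them. The base URL `http://localhost:3200` is an assumption, and the helper names are hypothetical. The query parameters (`q`, `start`, `end`, `limit`, `step`) are the ones the Tempo HTTP API uses for these endpoints. Real tool implementations may add authentication or other parameters.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlencode, urlparse

# Assumption: Tempo's HTTP API is reachable at this base URL (adjust for your setup).
TEMPO_URL = "http://localhost:3200"


def search_url(traceql: str, start: int, end: int, limit: int = 20) -> str:
    """GET /api/search with a TraceQL query in the 'q' parameter (start/end in epoch seconds)."""
    params = {"q": traceql, "start": start, "end": end, "limit": limit}
    return f"{TEMPO_URL}/api/search?{urlencode(params)}"


def trace_url(trace_id: str) -> str:
    """GET /api/v2/traces/{trace_id}: the full trace for one ID."""
    return f"{TEMPO_URL}/api/v2/traces/{quote(trace_id, safe='')}"


def tag_names_url() -> str:
    """GET /api/v2/search/tags: available tag names."""
    return f"{TEMPO_URL}/api/v2/search/tags"


def tag_values_url(tag: str) -> str:
    """GET /api/v2/search/tag/{tag}/values: the scoped tag name goes in the path."""
    return f"{TEMPO_URL}/api/v2/search/tag/{quote(tag, safe='')}/values"


def metrics_url(traceql: str, start: int, end: int, step: Optional[str] = None) -> str:
    """Instant query (/api/metrics/query) without a step, range query with one."""
    params = {"q": traceql, "start": start, "end": end}
    if step is None:
        return f"{TEMPO_URL}/api/metrics/query?{urlencode(params)}"
    params["step"] = step
    return f"{TEMPO_URL}/api/metrics/query_range?{urlencode(params)}"


url = search_url("{ status = error }", 1700000000, 1700003600)
print(url)
print(parse_qs(urlparse(url).query)["q"])  # decodes back to the original TraceQL text
```

`urlencode` percent-encodes the braces, quotes, and spaces in TraceQL, which is why the query text should never be concatenated into the URL by hand.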

## Usage Workflow

### 1. Discovering Available Data
Start by understanding what tags and values exist:
- Use `tempo_search_tag_names` to discover available tags
- Use `tempo_search_tag_values` to see all values for a specific tag (e.g., service names)
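Tag discovery output is easiest to reuse when each tag is turned into the scoped name TraceQL expects (`resource.service.name`, `span.http.method`). The sketch below assumes the v2 tag-names response groups tags under `scopes` (each with a `name` and a `tags` list), and that the tag-values response is a `tagValues` list of `{type, value}` objects. Both payloads here are illustrative and trimmed, and the helper names are hypothetical.

```python
import json


def scoped_tag_names(payload: dict) -> list:
    """Turn a scopes/tags payload into fully scoped TraceQL attribute names."""
    names = []
    for scope in payload.get("scopes", []):
        for tag in scope.get("tags", []):
            # Intrinsics (name, status, duration, ...) are written without a scope prefix.
            if scope["name"] == "intrinsic":
                names.append(tag)
            else:
                names.append(f"{scope['name']}.{tag}")
    return names


def tag_values(payload: dict) -> list:
    """Extract the raw values from a tagValues payload."""
    return [item["value"] for item in payload.get("tagValues", [])]


# Illustrative payloads only (assumed shape), trimmed to the fields this sketch reads.
tags_payload = json.loads(
    '{"scopes": [{"name": "resource", "tags": ["service.name", "k8s.pod.name"]},'
    ' {"name": "span", "tags": ["http.method"]}]}'
)
values_payload = json.loads(
    '{"tagValues": [{"type": "string", "value": "frontend"},'
    ' {"type": "string", "value": "checkout"}]}'
)

print(scoped_tag_names(tags_payload))
print(tag_values(values_payload))
```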

### 2. Searching for Traces

**TraceQL Search (recommended):**
Use `tempo_search_traces_by_query` with TraceQL syntax for powerful filtering.

**TraceQL Capabilities:**
TraceQL can select traces based on the following:
- **Span and resource attributes** - Filter by any attribute on spans or resources
- **Timing and duration** - Filter by trace/span duration
- **Basic aggregates** - Use aggregate functions to compute values across spans

**Supported Aggregate Functions:**
- `count()` - Count the number of spans matching the criteria
- `avg(attribute)` - Calculate average of a numeric attribute across spans
- `min(attribute)` - Find minimum value of a numeric attribute
- `max(attribute)` - Find maximum value of a numeric attribute
- `sum(attribute)` - Sum values of a numeric attribute across spans

**Aggregate Function Usage:**
Aggregates are used with the pipe operator `|` to filter traces based on computed values across their spans.

**Aggregate Examples:**
- `{ span.http.status_code = 200 } | count() > 3` - Find traces with more than 3 spans having HTTP 200 status
- `{ } | sum(span.bytesProcessed) > 1000000000` - Find traces where total processed bytes exceed 1 GB
- `{ status = error } | by(resource.service.name) | count() > 1` - Find traces in which a single service has more than 1 error span
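To make the pipeline semantics concrete, here is an in-memory model of what `{ status = error } | by(resource.service.name) | count() > 1` keeps. It is a conceptual sketch over made-up span dictionaries (`trace_id`, `service`, and `status` are hypothetical field names), not a description of how Tempo evaluates queries internally.

```python
from collections import defaultdict

spans = [  # illustrative data only
    {"trace_id": "t1", "service": "checkout", "status": "error"},
    {"trace_id": "t1", "service": "checkout", "status": "error"},
    {"trace_id": "t1", "service": "frontend", "status": "ok"},
    {"trace_id": "t2", "service": "checkout", "status": "error"},
    {"trace_id": "t2", "service": "payments", "status": "error"},
]


def errors_by_service_over(spans, threshold):
    """Return (trace_id, service) groups with more than `threshold` error spans."""
    groups = defaultdict(int)
    for span in spans:
        # { status = error }: only error spans survive the filter.
        if span["status"] == "error":
            # | by(resource.service.name): group the surviving spans per service within each trace.
            groups[(span["trace_id"], span["service"])] += 1
    # | count() > threshold: keep only the groups whose span count passes.
    return sorted(key for key, n in groups.items() if n > threshold)


print(errors_by_service_over(spans, 1))  # [('t1', 'checkout')]
```

Trace `t2` has two error spans but they belong to different services, so no single group exceeds the threshold.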

**Select Function:**
- `{ status = error } | select(span.http.status_code, span.http.url)` - Select specific attributes from error spans

**TraceQL Query Structure:**
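TraceQL queries follow the pattern `{span-selectors} | aggregate`: a spanset filter in braces, optionally followed by pipeline stages such as aggregates or `select()`.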
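When a query is assembled programmatically, the main pitfall is quoting: string values need double quotes, while enum-like intrinsics (`status = error`, `kind = server`) and numbers stay bare. This is a minimal sketch with hypothetical helpers (`Keyword`, `selector`, `pipeline`). It assumes JSON string escaping is an acceptable stand-in for TraceQL string literals, which holds for ordinary values but has not been checked against every edge case.

```python
import json


class Keyword(str):
    """Hypothetical marker for unquoted TraceQL literals such as error, ok, server, or 500ms."""


def render(value: object) -> str:
    if isinstance(value, Keyword):
        return str(value)  # enums and durations stay bare: status = error
    if isinstance(value, str):
        # Assumption: JSON escaping matches TraceQL's double-quoted strings for normal values.
        return json.dumps(value, ensure_ascii=False)
    return str(value)  # numbers such as 500 or 1.5


def selector(*conditions: tuple) -> str:
    """Join (attribute, operator, value) triples with && inside one spanset filter."""
    parts = [f"{attr} {op} {render(value)}" for attr, op, value in conditions]
    return "{ " + " && ".join(parts) + " }"


def pipeline(spanset: str, *stages: str) -> str:
    """Append pipeline stages (aggregates, by, select) with the | operator."""
    return " | ".join([spanset, *stages])


query = pipeline(
    selector(("resource.service.name", "=", "frontend"), ("status", "=", Keyword("error"))),
    "count() > 1",
)
print(query)  # { resource.service.name = "frontend" && status = error } | count() > 1
```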

**TraceQL Query Examples (from official docs):**

1. **Find traces of a specific operation:**
   ```
   {resource.service.name = "frontend" && name = "POST /api/orders"}
   ```
   ```
   {
     resource.service.namespace = "ecommerce" &&
     resource.service.name = "frontend" &&
     resource.deployment.environment = "production" &&
     name = "POST /api/orders"
   }
   ```

2. **Find traces with a particular outcome:**
   ```
   {
     resource.service.name="frontend" &&
     name = "POST /api/orders" &&
     status = error
   }
   ```
   ```
   {
     resource.service.name="frontend" &&
     name = "POST /api/orders" &&
     span.http.status_code >= 500
   }
   ```

3. **Find traces with a particular behavior:**
   ```
   {resource.service.name="frontend" && name = "GET /api/products/{id}"} && {span.db.system="postgresql"}
   ```

4. **Find traces that cross environments (spans in both production and staging):**
   ```
   { resource.deployment.environment = "production" } && { resource.deployment.environment = "staging" }
   ```

5. **Structural operators (advanced):**
   ```
   { resource.service.name="frontend" } >> { status = error }  # Error spans that are descendants of frontend spans
   { } !< { resource.service.name = "productcatalogservice" }  # productcatalogservice spans that are not the parent of any span (leaf spans)
   { resource.service.name = "productcatalogservice" } ~ { resource.service.name="frontend" }  # frontend spans that have a productcatalogservice sibling
   ```
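The structural operators walk the parent/child links between spans. As a conceptual model (not Tempo's implementation), `{ A } >> { B }` returns the spans matching B that have at least one ancestor matching A. The sketch below uses a hypothetical span table keyed by span ID, with parent ID, service, and status.

```python
# Illustrative spans: span_id -> (parent_id, service, status).
spans = {
    "a": (None, "frontend", "ok"),
    "b": ("a", "cart", "ok"),
    "c": ("b", "db", "error"),
    "d": (None, "batch", "error"),
}


def has_ancestor(span_id, pred):
    """Walk up the parent chain and report whether any ancestor satisfies pred."""
    parent = spans[span_id][0]
    while parent is not None:
        if pred(parent):
            return True
        parent = spans[parent][0]
    return False


def descendant(pred_a, pred_b):
    """Model of { A } >> { B }: spans matching B with some ancestor matching A."""
    return sorted(s for s in spans if pred_b(s) and has_ancestor(s, pred_a))


def is_frontend(span_id):
    return spans[span_id][1] == "frontend"


def is_error(span_id):
    return spans[span_id][2] == "error"


print(descendant(is_frontend, is_error))  # ['c']
```

Span `d` is an error too, but no frontend span sits above it, so `>>` drops it.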

6. **Additional operator examples:**
   ```
   { span.http.method = "GET" && status = ok } && { span.http.method = "DELETE" && status != ok }  # Inner &&: conditions on one span; outer &&: both spans in the same trace
   ```

   ```
   { resource.deployment.environment =~ "prod-.*" && span.http.status_code = 200 }  # =~ regex match
   { span.http.method =~ "DELETE|GET" }  # Regex match multiple values
   { trace:rootName !~ ".*perf.*" }  # !~ negated regex
   { resource.cloud.region = "us-east-1" } || { resource.cloud.region = "us-west-1" }  # || OR operator
   ```

   ```
   { span.http.status_code >= 400 && span.http.status_code < 500 }  # Client errors (4xx)
   { span.http.url = "/path/of/api" } >> { span.db.name = "db-shard-001" }  # >> descendant
   { span.http.status_code = 200 } | select(resource.service.name)  # Select specific attributes
   ```

**Common Attributes to Query:**
- `resource.service.name` - Service name
- `resource.k8s.*` - Kubernetes metadata (pod.name, namespace.name, deployment.name, etc.)
- `span.http.*` - HTTP attributes (status_code, method, route, url, etc.)
- `name` - Span name
- `status` - Span status (error, ok, unset)
- `duration` - Span duration
- `kind` - Span kind (server, client, producer, consumer, internal)

**Tag-based Search (legacy):**
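Use `tempo_search_traces_by_tags` with logfmt-encoded tags when you want simple key/value matching combined with min/max duration filters (in TraceQL, filter on `duration` or `traceDuration` instead):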
- Example: `service.name="api" http.status_code="500"`
- Supports `min_duration` and `max_duration` parameters
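The `tags` value is a logfmt string, so values containing spaces or quotes must be quoted and escaped. This sketch builds the query string for `GET /api/search`. It assumes the tool's `min_duration`/`max_duration` parameters map onto the Tempo HTTP API's `minDuration`/`maxDuration`, which take duration strings such as `100ms` or `2s`. The helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode


def logfmt(tags: dict) -> str:
    """Encode tags as logfmt, always quoting values and escaping \\ and \"."""
    parts = []
    for key, value in tags.items():
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{escaped}"')
    return " ".join(parts)


def tag_search_params(tags: dict, min_duration: Optional[str] = None,
                      max_duration: Optional[str] = None) -> str:
    """Query string for GET /api/search using tags plus optional duration bounds."""
    params = {"tags": logfmt(tags)}
    if min_duration:
        params["minDuration"] = min_duration
    if max_duration:
        params["maxDuration"] = max_duration
    return urlencode(params)


qs = tag_search_params({"service.name": "api", "http.status_code": "500"}, min_duration="100ms")
print(parse_qs(qs))
```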

### 3. Analyzing Specific Traces
When you have trace IDs from search results:
- Use `tempo_query_trace_by_id` to get full trace details
- Examine spans for errors, slow operations, and bottlenecks
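A quick first pass over a full trace is to rank spans by duration and list the error spans. This sketch assumes the trace comes back as OTLP-style JSON. Because the exact wrapper differs between API versions, it accepts either `batches` or `resourceSpans`, either `scopeSpans` or the older `instrumentationLibrarySpans`, and an optional top-level `trace` key. It treats status code `2` or `"STATUS_CODE_ERROR"` as an error. All of these shape details are assumptions, and the example payload is made up.

```python
import json


def iter_spans(payload: dict):
    """Yield (service_name, span) pairs from an OTLP-style JSON trace."""
    trace = payload.get("trace", payload)  # assumption: v2 responses may wrap the trace
    for rs in trace.get("batches") or trace.get("resourceSpans") or []:
        attrs = rs.get("resource", {}).get("attributes", [])
        service = next(
            (a["value"].get("stringValue", "") for a in attrs if a["key"] == "service.name"),
            "unknown",
        )
        for ss in rs.get("scopeSpans") or rs.get("instrumentationLibrarySpans") or []:
            for span in ss.get("spans", []):
                yield service, span


def summarize(payload: dict, top: int = 3):
    """Return the `top` slowest spans and all error spans as (ms, service, name, is_error)."""
    rows = []
    for service, span in iter_spans(payload):
        # OTLP JSON encodes nanosecond timestamps as strings, so convert before subtracting.
        ms = (int(span["endTimeUnixNano"]) - int(span["startTimeUnixNano"])) / 1e6
        code = span.get("status", {}).get("code")
        rows.append((ms, service, span.get("name", ""), code in (2, "STATUS_CODE_ERROR")))
    slowest = sorted(rows, reverse=True)[:top]
    errors = [row for row in rows if row[3]]
    return slowest, errors


example = json.loads("""
{"trace": {"resourceSpans": [{
  "resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "frontend"}}]},
  "scopeSpans": [{"spans": [
    {"name": "GET /cart", "startTimeUnixNano": "1000000000", "endTimeUnixNano": "1250000000"},
    {"name": "SELECT cart", "startTimeUnixNano": "1010000000", "endTimeUnixNano": "1200000000",
     "status": {"code": 2}}
  ]}]
}]}}
""")
slowest, errors = summarize(example)
print(slowest)
print(errors)
```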

### 4. Computing Metrics from Traces
**TraceQL metrics** compute aggregated metrics from your trace data, helping you answer critical questions like:
- How many database calls across all systems are downstream of your application?
- What services beneath a given endpoint are failing?
- What services beneath an endpoint are slow?

TraceQL metrics parse your traces in aggregate to provide RED (Rate, Error, Duration) metrics from trace data.

**Supported Functions:**
- `rate` - Calculate rate of spans/traces
- `count_over_time` - Count spans/traces over time
- `sum_over_time` - Sum span attributes
- `avg_over_time` - Average of span attributes
- `max_over_time` - Maximum value over time
- `min_over_time` - Minimum value over time
- `quantile_over_time` - Calculate quantiles
- `histogram_over_time` - Generate histogram data
- `compare` - Compare metrics between time periods

**Modifiers:**
- `topk` - Return top N results
- `bottomk` - Return bottom N results

**TraceQL Metrics Query Examples:**

1. **rate** - Error rate of service `foo`, grouped by HTTP route:
   ```
   { resource.service.name = "foo" && status = error } | rate() by (span.http.route)
   ```

2. **count_over_time** - Count spans by HTTP status code:
   ```
   { name = "GET /:endpoint" } | count_over_time() by (span.http.status_code)
   ```

3. **sum_over_time** - Sum HTTP response sizes by service:
   ```
   { name = "GET /:endpoint" } | sum_over_time(span.http.response.size) by (resource.service.name)
   ```

4. **avg_over_time** - Average duration by HTTP status code:
   ```
   { name = "GET /:endpoint" } | avg_over_time(duration) by (span.http.status_code)
   ```

5. **max_over_time** - Maximum response size by HTTP target:
   ```
   { name = "GET /:endpoint" } | max_over_time(span.http.response.size) by (span.http.target)
   ```

6. **min_over_time** - Minimum duration by HTTP target:
   ```
   { name = "GET /:endpoint" } | min_over_time(duration) by (span.http.target)
   ```

7. **quantile_over_time** - Calculate multiple percentiles (99th, 90th, 50th) with exemplars:
   ```
   { span:name = "GET /:endpoint" } | quantile_over_time(duration, .99, .9, .5) by (span.http.target) with (exemplars=true)
   ```

8. **histogram_over_time** - Build duration histogram grouped by custom attribute:
   ```
   { name = "GET /:endpoint" } | histogram_over_time(duration) by (span.foo)
   ```

9. **compare** - Compare error spans against the baseline (returning up to the top 10 values per attribute):
   ```
   { resource.service.name="a" && span.http.path="/myapi" } | compare({status=error}, 10)
   ```

10. **Using topk modifier** - Find top 10 endpoints by request rate:
    ```
    { resource.service.name = "foo" } | rate() by (span.http.url) | topk(10)
    ```
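As a rough mental model of what `rate() by (...)` returns, the sketch below counts matching spans in each `step`-sized bucket and divides by the step length, giving spans per second per series. It is a conceptual illustration with hypothetical fields (`start` in seconds, `route`), not Tempo's exact bucket alignment. Unlike a real range query, it omits buckets that have no spans rather than reporting zeros.

```python
from collections import Counter


def rate_by(spans, start, end, step, key):
    """Conceptual model of `{ ... } | rate() by (key)`: matching spans per second in each step."""
    counts = Counter()
    for span in spans:
        t = span["start"]  # span start time in seconds (hypothetical field)
        if start <= t < end:
            bucket = start + (t - start) // step * step  # left edge of the step bucket
            counts[(span[key], bucket)] += 1
    return {series: n / step for series, n in sorted(counts.items())}


spans = [
    {"start": 0, "route": "/a"},
    {"start": 10, "route": "/a"},
    {"start": 65, "route": "/a"},
    {"start": 30, "route": "/b"},
]
for (route, bucket), value in rate_by(spans, 0, 120, 60, "route").items():
    print(route, bucket, round(value, 4))
```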

**Choosing Between Instant and Range Queries:**

**Instant Metrics** (`tempo_query_metrics_instant`) - Returns a single aggregated value for the entire time range. Use this when:
- You need a total count or sum across the whole period
- You want a single metric value (e.g., total error count, average latency)
- You don't need to see how the metric changes over time
- You're computing a KPI or summary statistic

**Time Series Metrics** (`tempo_query_metrics_range`) - Returns values at regular intervals controlled by the 'step' parameter. Use this when:
- You need to graph metrics over time or analyze trends
- You want to see patterns, spikes, or changes in metrics
- You're troubleshooting time-based issues
- You need to correlate metrics with specific time periods
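For range queries, the `step` value controls how many points come back. A common heuristic is to derive it from the window length so that long windows do not produce huge responses. In this sketch the limits `max_points=200` and `min_step=15` are arbitrary choices, and formatting the step as a duration string such as `432s` is an assumption about what the tool accepts.

```python
import math


def pick_step(start: int, end: int, max_points: int = 200, min_step: int = 15) -> str:
    """Pick a 'step' (seconds) so a range query returns at most max_points samples per series."""
    window = max(end - start, 1)  # query window in seconds
    step = max(min_step, math.ceil(window / max_points))
    return f"{step}s"


print(pick_step(0, 86400))  # 24h window
print(pick_step(0, 600))    # 10 min window, clamped to min_step
```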

## Special workflow for performance issues
When investigating performance issues in Kubernetes via traces, call `tempo_fetch_traces_comparative_sample`. This tool analyzes a comparative sample of traces to help identify patterns.

## Important Notes
- TraceQL is the modern query language - prefer it over tag-based search
- TraceQL metrics is an experimental feature that computes RED (Rate, Error, Duration) metrics from trace data, not from traditional Prometheus metrics
- Common attributes to use in queries: resource.service.name, span.http.route, span.http.status_code, span.http.target, status, name, duration
- Timestamps can be given as Unix epoch seconds or as RFC3339 strings
- Use time filters (start/end) to improve query performance
- For Kubernetes resources, try these attributes first: resource.service.name, resource.k8s.pod.name, resource.k8s.namespace.name, resource.k8s.deployment.name, resource.k8s.node.name, resource.k8s.container.name
- TraceQL and TraceQL metrics are complex languages. If a query returns empty data, simplify it and try again!
- IMPORTANT: TraceQL (search) is not the same as TraceQL metrics. Search aggregates such as `count()` and `avg()` filter traces in search queries; metrics functions such as `rate()` and `quantile_over_time()` only work with the metrics tools. Make sure you use the correct syntax and functions
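One way to avoid mixing the two languages is to check which kind of query you have before choosing a tool. This hypothetical router looks for a pipe followed by one of the metrics functions listed in this guide. It is a heuristic: a string literal that happens to contain `| rate(` would be misclassified.

```python
import re

# Metrics functions listed in this guide; they only belong in TraceQL metrics queries.
METRICS_FUNCTIONS = (
    "rate", "count_over_time", "sum_over_time", "avg_over_time", "max_over_time",
    "min_over_time", "quantile_over_time", "histogram_over_time", "compare",
)
# A pipe followed by one of those names and an opening parenthesis, e.g. "| rate(".
_METRICS_STAGE = re.compile(r"\|\s*(?:" + "|".join(METRICS_FUNCTIONS) + r")\s*\(")


def is_metrics_query(query: str) -> bool:
    """Heuristic only: a string literal containing '| rate(' would also match."""
    return _METRICS_STAGE.search(query) is not None


def pick_tool(query: str, want_time_series: bool = True) -> str:
    """Hypothetical router from a query string to one of the tools in this guide."""
    if not is_metrics_query(query):
        return "tempo_search_traces_by_query"
    return "tempo_query_metrics_range" if want_time_series else "tempo_query_metrics_instant"


print(pick_tool('{ status = error } | by(resource.service.name) | count() > 1'))
print(pick_tool('{ resource.service.name = "foo" } | rate() by (span.http.url) | topk(10)'))
```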
